NSF PAR Search | NSF Public Access Repository

L-DQN: An Asynchronous Limited-Memory Distributed Quasi-Newton Method

https://doi.org/10.1109/CDC45484.2021.9682985

Can, Bugra; Soori, Saeed; Dehnavi, Maryam Mehri; Gurbuzbalaban, Mert (December 2021, 2021 60th IEEE Conference on Decision and Control (CDC))

Full Text Available
HyLo: a hybrid low-rank natural gradient descent method

Mu, Baorun; Soori, Saeed; Can, Bugra; Gurbuzbalaban, Mert; Dehnavi, Maryam Mehri (January 2022, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis)

This work presents a Hybrid Low-Rank Natural Gradient Descent method, called HyLo, that reduces the training time of deep neural networks. Natural gradient descent (NGD) requires computing the inverse of the Fisher information matrix (FIM), which is typically expensive at large scale. Kronecker factorization methods such as KFAC attempt to improve NGD's running time by approximating the FIM with Kronecker factors. However, the size of Kronecker factors increases quadratically as the model size grows. Instead, in HyLo, we use the Sherman-Morrison-Woodbury variant of NGD (SNGD) and propose a reformulation of SNGD to resolve its scalability issues. HyLo uses a computationally efficient low-rank factorization to achieve superior timing for Fisher inverses. We evaluate HyLo on large models including ResNet-50, U-Net, and ResNet-32 on up to 64 GPUs. HyLo converges 1.4×-2.1× faster than the state-of-the-art distributed implementation of KFAC and reduces the computation and communication time by up to 350× and 10.7×, respectively, on ResNet-50.
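As a rough illustration of the Sherman-Morrison-Woodbury idea mentioned in this abstract (not HyLo's actual reformulation or code), the sketch below assumes a damped empirical Fisher matrix F = lam*I + (1/n) * sum_i u_i u_i^T built from n per-sample gradients u_i of dimension d. The identity F^{-1} g = (1/lam) * (g - U (n*lam*I + U^T U)^{-1} U^T g) replaces a d x d inverse with an n x n solve. All function names, the damping parameter `lam`, and the toy data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def smw_natural_gradient(per_sample_grads, g, lam):
    """Return F^{-1} g using only an n x n solve (Woodbury identity)."""
    n, d = len(per_sample_grads), len(g)
    # Gram matrix U^T U (n x n), with n*lam added on the diagonal.
    K = [[sum(ui[k] * uj[k] for k in range(d)) for uj in per_sample_grads]
         for ui in per_sample_grads]
    for i in range(n):
        K[i][i] += n * lam
    # Small right-hand side U^T g, then the n x n solve.
    rhs = [sum(u[k] * g[k] for k in range(d)) for u in per_sample_grads]
    z = solve(K, rhs)
    # Assemble (1/lam) * (g - U z).
    return [(g[k] - sum(z[i] * per_sample_grads[i][k] for i in range(n))) / lam
            for k in range(d)]


def direct_natural_gradient(per_sample_grads, g, lam):
    """Reference: form the d x d damped Fisher explicitly and solve."""
    n, d = len(per_sample_grads), len(g)
    F = [[sum(u[i] * u[j] for u in per_sample_grads) / n
          + (lam if i == j else 0.0) for j in range(d)] for i in range(d)]
    return solve(F, g)


# Toy data: n = 2 per-sample gradients in d = 4 dimensions.
U = [[1.0, 2.0, 0.0, -1.0], [0.5, -1.0, 3.0, 2.0]]
g = [1.0, 0.0, -2.0, 0.5]
fast = smw_natural_gradient(U, g, 0.1)
slow = direct_natural_gradient(U, g, 0.1)
```

The two results should agree up to floating-point error; the saving only matters when n is much smaller than d, which is the regime a low-rank approach targets.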
Full Text Available
Randomized Gossiping with Effective Resistance Weights: Performance Guarantees and Applications

https://doi.org/10.1109/TCNS.2022.3161201

Can, Bugra; Gurbuzbalaban, Mert; Aybat, Necdet Serhat; Soori, Saeed; Mehri Dehnavi, Maryam (January 2022, IEEE Transactions on Control of Network Systems)

Full Text Available
ASYNC: A Cloud Engine with Asynchrony and History for Distributed Machine Learning

https://doi.org/10.1109/IPDPS47924.2020.00052

Soori, Saeed; Can, Bugra; Gurbuzbalaban, Mert; Dehnavi, Maryam Mehri (May 2020, 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS))
ASYNC is a framework that supports the implementation of asynchrony and history for optimization methods on distributed computing platforms. The popularity of asynchronous optimization methods has increased in distributed machine learning. However, their applicability and practical experimentation on distributed systems are limited because current bulk-processing cloud engines do not provide robust support for asynchrony and history. By introducing three main modules and bookkeeping for system-specific and application parameters, ASYNC provides practitioners with a framework to implement asynchronous machine learning methods. To demonstrate ease of implementation, the synchronous and asynchronous variants of two well-known optimization methods, stochastic gradient descent and SAGA, are implemented in ASYNC.
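To show why SAGA needs "history", here is a minimal single-process sketch of the standard SAGA update on a one-dimensional least-squares problem. It does not use ASYNC's API and involves no distribution or asynchrony; the function name, step-size choice, and toy data are assumptions made for illustration.

```python
import random


def saga_least_squares(a, b, steps, seed=0):
    """Serial SAGA on f(x) = (1/n) * sum_i 0.5 * (a[i]*x - b[i])**2."""
    rng = random.Random(seed)
    n = len(a)
    # Step size 1/(3L) with L = max_i a[i]**2, a common SAGA choice.
    step = 1.0 / (3.0 * max(ai * ai for ai in a))
    x = 0.0
    # History: the last gradient evaluated for each component, and their mean.
    table = [ai * (ai * x - bi) for ai, bi in zip(a, b)]
    avg = sum(table) / n
    for _ in range(steps):
        j = rng.randrange(n)
        g_new = a[j] * (a[j] * x - b[j])
        # The update combines the fresh gradient with the stored one.
        x -= step * (g_new - table[j] + avg)
        # Refresh the history for component j.
        avg += (g_new - table[j]) / n
        table[j] = g_new
    return x


a = [1.0, 2.0, 3.0]
b = [2.0, 4.5, 5.5]
x_hat = saga_least_squares(a, b, steps=2000)
x_star = sum(ai * bi for ai, bi in zip(a, b)) / sum(ai * ai for ai in a)
```

The gradient table is the state that a distributed engine would have to store and keep consistent across workers, which is the kind of bookkeeping the abstract refers to.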
Full Text Available
Accelerated Linear Convergence of Stochastic Momentum Methods in Wasserstein Distances

Can, Bugra; Gurbuzbalaban, Mert; Zhu, Lingjiong (July 2019, Proceedings of Machine Learning Research)

Momentum methods such as Polyak's heavy ball (HB) method, Nesterov's accelerated gradient (AG) method, and the accelerated projected gradient (APG) method have been commonly used in machine learning practice, but their performance is quite sensitive to noise in the gradients. We study these methods under a first-order stochastic oracle model where noisy estimates of the gradients are available. For strongly convex problems, we show that the distribution of the iterates of AG converges with the accelerated linear rate to a ball of radius ε centered at a unique invariant distribution in the 1-Wasserstein metric, where κ is the condition number, as long as the noise variance is smaller than an explicit upper bound we can provide. Our analysis also certifies linear convergence rates as a function of the stepsize, momentum parameter, and the noise variance, recovering the accelerated rates in the noiseless case and quantifying the level of noise that can be tolerated to achieve a given performance. To the best of our knowledge, these are the first linear convergence results for stochastic momentum methods under the stochastic oracle model. We also develop finer results for the special case of quadratic objectives, and extend our results to the APG method and weakly convex functions, showing accelerated rates when the noise magnitude is sufficiently small.
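The setting can be pictured with a small simulation, offered only as an informal sketch and not as the paper's analysis. It runs Nesterov's AG on a diagonal quadratic with additive Gaussian gradient noise, using the textbook parameters stepsize 1/L and momentum (√κ - 1)/(√κ + 1). The objective, noise model, function name, and constants are assumptions chosen for illustration.

```python
import math
import random


def noisy_nesterov(h, x0, sigma, steps, seed=0):
    """Run AG on f(x) = 0.5 * sum(h[i] * x[i]**2) with noisy gradients.

    Returns the Euclidean distance of the final iterate to the minimizer 0.
    """
    rng = random.Random(seed)
    L, mu = max(h), min(h)
    kappa = L / mu  # condition number
    alpha = 1.0 / L  # stepsize
    beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)  # momentum
    x_prev, x = list(x0), list(x0)
    for _ in range(steps):
        # Extrapolation step.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        # Stochastic oracle: exact gradient plus zero-mean Gaussian noise.
        grad = [hi * yi + sigma * rng.gauss(0.0, 1.0) for hi, yi in zip(h, y)]
        # Gradient step taken from the extrapolated point.
        x_prev, x = x, [yi - alpha * gi for yi, gi in zip(y, grad)]
    return math.sqrt(sum(xi * xi for xi in x))


h = [1.0, 10.0, 100.0]  # curvatures, so kappa = 100
x0 = [1.0, 1.0, 1.0]
err_noiseless = noisy_nesterov(h, x0, sigma=0.0, steps=500)
err_noisy = noisy_nesterov(h, x0, sigma=0.1, steps=500)
```

In the noiseless run the iterate converges to the minimizer, while in the noisy run it keeps fluctuating in a neighborhood whose size depends on the noise level. A single trajectory only hints at the distributional statement in the abstract.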
Full Text Available
